Abstract



We consider a number of proposals for the entropy of sets of classical 
coarse-grained histories based on the procedures of Jaynes, and prove a
series of inequalities relating these measures. We then examine these as 
a function of the coarse-graining for various classical systems, and show 
explicitly that the entropy is minimized by the finest-grained descrip- 
tion of a set of histories. We propose an extension of the second law of 
thermodynamics to the entropy of histories. We briefly discuss the im- 
plications for decoherent or consistent history formulations of quantum 
mechanics. 
I. INTRODUCTION



Entropies are measures of the information missing from a coarse-grained descrip- 
tion of a system. Different coarse-grained descriptions give rise to different entropies. 
If an entropy is low at one time, it will have a general tendency to grow as its coarse- 
graining is translated forward in time. That is the second law of thermodynamics. 

Usually, entropy is constructed from a coarse-grained description at a single mo- 
ment in time. For example, if all that is known of the state of a system at a particular 
time is its total energy, the missing information is the entropy of the microcanonical 
ensemble — a quantity which is independent of time. If all that is known at a time 
are the expected values of the energy, number, and momentum densities, averaged
over volumes large enough to be in local equilibrium, then the missing information 
per volume is the time-dependent entropy density of hydrodynamics. 

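The Jaynes procedure [1] gives a general method for constructing the entropy of
a system at a moment of time. To illustrate, let $M$ be the phase space of a classical
system and $\rho(x)$, $x \in M$, be the probability distribution representing the state
of the system. Suppose $A(x)$ is a classical quantity whose expected value
$\langle A \rangle$ is known, where

$$\langle A \rangle = \int_M dx\, A(x)\, \rho(x) . \tag{1.1}$$

The missing information $S$ is constructed by maximizing the entropy functional

$$S(\tilde\rho) = -\int_M dx\, \tilde\rho(x) \log_2 \tilde\rho(x) \tag{1.2}$$

over all $\tilde\rho(x)$ that imply the same expected value. In symbols,

$$S = \max_{\tilde\rho} S(\tilde\rho) \Big|_{\langle A \rangle_{\tilde\rho} = \langle A \rangle} . \tag{1.3}$$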
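The construction (1.1)-(1.3) can be made concrete on a finite state space. The following is a minimal sketch, not taken from the paper: it assumes a discrete $M$, uses the standard result that the maximizing distribution has the exponential form $\tilde\rho_i \propto 2^{-\lambda A_i}$, and finds the multiplier $\lambda$ by bisection. The function names and the bracketing interval for $\lambda$ are illustrative choices.

```python
import math

def maxent_distribution(a_values, a_mean, tol=1e-12):
    """Maximum-entropy distribution on a finite set with fixed <A>.

    Assumes a_mean lies strictly between min(a_values) and max(a_values).
    The maximizer is proportional to 2**(-lam * a); lam is found by bisection.
    """
    def dist(lam):
        exps = [-lam * a for a in a_values]
        m = max(exps)                      # shift exponents for numerical stability
        w = [2.0 ** (e - m) for e in exps]
        z = sum(w)
        return [x / z for x in w]

    def mean(lam):
        return sum(p * a for p, a in zip(dist(lam), a_values))

    lo, hi = -50.0, 50.0                   # mean(lam) decreases monotonically in lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > a_mean:
            lo = mid                       # mean too large: increase lam
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

def entropy_bits(p):
    """Missing information -sum p log2 p, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)
```

When the constraint is uninformative (for instance $\langle A \rangle$ equal to the unweighted average of the $A_i$) the sketch returns the uniform distribution, and any other value of $\langle A \rangle$ gives a smaller missing information.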
However, entropy need not only apply to coarse-grained alternatives at one moment
in time. More generally, one can consider the missing information of a sequence
of alternatives at a succession of times. These are the entropies of coarse-grained
histories of the system. A variety of such entropies have been described and
applied to measures of coarse-graining, classicality, and effective complexity [10].
In theories which possess a notion of history but lack a fixed notion of time (such
as certain formulations of quantum gravity) the missing information of histories
may be the only notion of entropy available.

In this paper we examine the entropy of histories for classical stochastic systems 
— classical systems with a probabilistic law of evolution (including deterministic 
evolution as a special case). We use Isham's history space [10] and a generalization
of the Jaynes procedure to give a unified view of several different kinds of entropies for 
histories and describe relations among them. We illustrate with numerical calculations 
in some simple examples. Finally, we describe a modest generalization of the second 
law of thermodynamics applicable to the entropy of histories and test it in a simple 
model. Our considerations are almost entirely classical, but in Section V we point 
the way to generalizations for the quantum mechanical case. 
II. ENTROPIES OF HISTORIES
A. Histories and History Space 



We consider classical theories with (most generally) a stochastic evolution law in
a space $M$ through a finest-grained net of times separated by equal intervals $\eta$.
For this discussion the space $M$ could be a configuration space of particle positions,
a spatial lattice, or a phase space. We denote a point in $M$ by $x$.

The cases when M is a discrete space or a continuous manifold differ only formally, 
and can to a large extent be treated together by using a common notation. We define 

$$\mathrm{Tr}\, f(x) = \sum_{x \in M} f(x) \tag{2.1a}$$

when $M$ is discrete, and

$$\mathrm{Tr}\, f(x) = \int_M dx\, f(x) \tag{2.1b}$$

when $M$ is continuous. This notation is suggestive for the quantum mechanical case
to be treated later. We also define

$$V = \mathrm{Tr}(I) , \tag{2.1c}$$

where $I$ is the unit function on $M$. $V$ is an integer when $M$ is discrete and a real
number when $M$ is continuous.

A fine-grained history is described by a sequence of $x_A$, $A = 1, \cdots, N$, one for
each of the finest-grained net of times. Histories are therefore naturally thought of as
living in a classical "history space" $\mathbf{M} = M \times \cdots \times M$, with one
factor for each fine-grained time. A point in $\mathbf{M}$ is denoted by $\mathbf{x}$, and
corresponds to a fine-grained history. This is the classical analog of the "history space"
introduced by Isham [10], and used so effectively in quantum theory by Isham and
Linden [11], and Isham, Linden and Schreckenberg [12].
A coarse-grained¹ set of alternative histories is a partition of the set $\mathbf{M}$ of
fine-grained histories into an exhaustive set of mutually exclusive regions or classes
$c_\alpha$. Each class is a single coarse-grained history. We can usefully introduce
projections onto these regions of $\mathbf{M}$,

$$P_\alpha(\mathbf{x}) = \begin{cases} 1 & \mathbf{x} \in c_\alpha \\ 0 & \text{otherwise.} \end{cases} \tag{2.2}$$

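A sequence of coarse-grained alternatives at a series of times $t_1, \cdots, t_n$ is an
example of a coarse-grained history. Suppose the alternatives at time $t_k$ are whether
$x$ is in one of a set of regions of $M$, $\{\Delta^k_{\alpha_k}\}$, $\alpha_k = 1, 2, \ldots$,
with volumes $V^k_{\alpha_k}$. We introduce projections on these regions of $M$,

$$P^k_{\alpha_k}(x) = \begin{cases} 1 & x \in \Delta^k_{\alpha_k} \\ 0 & \text{otherwise,} \end{cases} \tag{2.3}$$

which satisfy

$$P^k_{\alpha_k}(x)\, P^k_{\alpha'_k}(x) = \delta_{\alpha_k \alpha'_k}\, P^k_{\alpha_k}(x) \tag{2.4}$$

and

$$\mathrm{Tr}(P^k_{\alpha_k}) = V^k_{\alpha_k} . \tag{2.5}$$

In this case, a coarse-grained history is a particular sequence of regions
$\alpha = (\alpha_n, \ldots, \alpha_1)$ and corresponds to a projection on $\mathbf{M}$ of the form

$$P_\alpha = I \times \cdots \times P^n_{\alpha_n} \times I \times \cdots \times P^1_{\alpha_1} \times \cdots \times I . \tag{2.6}$$

That is, $P_\alpha$ is the projection on $\mathbf{M}$ with projections $P^k_{\alpha_k}$ inserted
at the times $t_k$ and $I$'s at all other times. In the discrete case, the most general
projection $P_\alpha$ can always be written as a sum of such chains: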



^The notion of coarse-graining has many specific applications in physics. An anonymous 
referee suggested as a convenient reference to some of these. 
$$P_\alpha(\mathbf{x}) = \sum_{(\alpha_n \cdots \alpha_1) \in \alpha} P_{\alpha_n \cdots \alpha_1}(\mathbf{x}) , \tag{2.7}$$

which allows the construction of a narrative for each coarse-grained history. For
histories of the form (2.6) it would read: "the system was in $\Delta^1_{\alpha_1}$ at $t_1$,
then $\Delta^2_{\alpha_2}$ at $t_2$, $\cdots$". In the continuum case there is a corresponding
integral.

We assume that there is a probability law for the fine-grained histories, that is, a
probability function $W(\mathbf{x})$ on $\mathbf{M}$. $W(\mathbf{x})$ satisfies

$$W(\mathbf{x}) \geq 0 , \quad \text{and} \quad \mathrm{Tr}(W) = 1 . \tag{2.8}$$

Of course, $W(\mathbf{x})$ may have special forms in particular circumstances. For example,
for a Markov process

$$W(\mathbf{x}) = p_\eta(x_N | x_{N-1})\, p_\eta(x_{N-1} | x_{N-2}) \cdots p_\eta(x_1 | x_0)\, \rho(x_0) , \tag{2.9}$$

where $\rho(x_0)$ is the distribution at the initial time and $p_\eta(x|y)$ is the transition
probability to arrive at $x$ in a time $\eta$ having started from $y$. If $M$ were a classical
phase space, deterministic evolution would be represented by (2.9) with

$$p_\eta(x|y) = \delta(x - x_\eta(y)) , \tag{2.10}$$

where $x_\eta(y)$ is the phase-space point $y$ evolved by the time $\eta$.

The probability of a coarse-grained history is

$$p_\alpha = \mathrm{Tr}(P_\alpha W) . \tag{2.11}$$

For example, in the case of a sequence of alternatives like (2.3) at a series of times,
and a Markovian probability of the form (2.9),

$$p_{\alpha_n \cdots \alpha_1} = \int dx_n \cdots \int dx_0\, P^n_{\alpha_n}(x_n)\, p(x_n t_n | x_{n-1} t_{n-1})\, P^{n-1}_{\alpha_{n-1}}(x_{n-1}) \cdots P^1_{\alpha_1}(x_1)\, p(x_1 t_1 | x_0 t_0)\, \rho(x_0)$$

$$= \int dx_0\, C_{\alpha_n \cdots \alpha_1}(x_0)\, \rho(x_0) . \tag{2.12}$$

Here $p(x't' | xt)$ is the composition of all the $p_\eta$'s from $t$ to $t'$, $t_0$ is the
initial time, and $C_{\alpha_n \cdots \alpha_1}(x_0)$ is defined to be the probability of a
coarse-grained history $\alpha_1 \cdots \alpha_n$ given that the system is initially at $x_0$.
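The chain of projections and transition probabilities in (2.12) can be evaluated directly for a small discrete system. The sketch below is an illustration under stated assumptions rather than the procedure used in the paper: it takes a symmetric random walk on a ring of `V` sites started at a definite point `x0`, uses cells of width `dx` specified every `dt` steps, and labels each coarse-grained history by the tuple $(\alpha_1, \ldots, \alpha_n)$. All function and parameter names are hypothetical.

```python
def step(dist, V):
    """One step of a symmetric random walk on a ring of V sites."""
    new = [0.0] * V
    for x, w in enumerate(dist):
        if w:
            new[(x + 1) % V] += 0.5 * w
            new[(x - 1) % V] += 0.5 * w
    return new

def history_probabilities(V, N, dx, dt, x0=0):
    """Probabilities of coarse-grained histories, in the spirit of Eq. (2.12).

    Assumes V is divisible by dx. Each branch carries the unnormalized
    distribution over x restricted to one sequence of cells so far.
    """
    n_cells = V // dx
    branches = {(): [1.0 if x == x0 else 0.0 for x in range(V)]}
    for t in range(1, N + 1):
        # propagate every branch by one fine-grained time step
        branches = {h: step(d, V) for h, d in branches.items()}
        if t % dt == 0:
            # insert the projections onto the cells at this coarse-grained time
            projected = {}
            for h, d in branches.items():
                for c in range(n_cells):
                    proj = [w if x // dx == c else 0.0 for x, w in enumerate(d)]
                    if sum(proj) > 0:
                        projected[h + (c,)] = proj
            branches = projected
    return {h: sum(d) for h, d in branches.items()}
```

The number of branches grows quickly with the number of specified times, so this exact enumeration is only practical for short histories.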
B. The Entropy of Histories



The Jaynes construction may now be applied in history space to give an entropy
for histories. We introduce the entropy functional²

$$S(W) = -\mathrm{Tr}(W \log_2 W) . \tag{2.13}$$

(Unable to introduce a bold-faced calligraphic $S$, we rely on the argument of $S$ to
distinguish this definition from (1.2).)

The history space entropy $S_{hs}(\{c_\alpha\})$ of a set of coarse-grained alternative
histories is then

$$S_{hs}(\{c_\alpha\}) = \max_{\tilde W} S(\tilde W) \Big|_{\mathrm{Tr}(P_\alpha \tilde W) = \mathrm{Tr}(P_\alpha W)} . \tag{2.14}$$

In words, $S_{hs}$ maximizes the missing information $S$ over all probability distributions
$\tilde W$ on $\mathbf{M}$ that reproduce the probabilities of the coarse-grained histories
$\{c_\alpha\}$ following from $W$.

The important property of $S_{hs}(\{c_\alpha\})$ is that it increases on coarse-graining.
Specifically, suppose $\{\bar c_{\bar\alpha}\}$ is a coarse-graining of the set $\{c_\alpha\}$.
That means that $\{\bar c_{\bar\alpha}\}$ is a partition of the $\{c_\alpha\}$ into larger
classes, and

$$\bar c_{\bar\alpha} = \bigcup_{\alpha \in \bar\alpha} c_\alpha . \tag{2.15}$$

Then, as with any Jaynes type construction,

$$S_{hs}(\{c_\alpha\}) \leq S_{hs}(\{\bar c_{\bar\alpha}\}) . \tag{2.16}$$

The proof is immediate from (2.14). The constraints for $\{c_\alpha\}$ contain those for
$\{\bar c_{\bar\alpha}\}$ but there are more of them. The maximum therefore can only be less.



²In the continuous case, where $W$ is a probability density rather than a probability,
Eq. (2.13) is not generally invariant under dimensional transformations. However, (2.13)
is a standard definition. Dimensionally invariant quantities may be obtained by appropriate
subtractions, e.g. $-\log_2 V^N$, or better, by rescaling the coordinates so they are
dimensionless.
Since the $P_\alpha(\mathbf{x})$ are mutually exclusive projections, an expression for
$S_{hs}$ can be derived by carrying out the maximization using Lagrange multipliers to
enforce the constraints. The result is

$$S_{hs}(\{c_\alpha\}) = -\sum_\alpha p_\alpha \log_2 p_\alpha + \sum_\alpha p_\alpha \log_2 \mathrm{Tr}(P_\alpha) . \tag{2.17}$$

The maximum value of $S_{hs}$ occurs for the coarsest-grained set of histories, where
the only history with nonzero probability is $I \times \cdots \times I$. The maximum is

$$S^{max}_{hs} = N \log_2 V . \tag{2.18}$$

The minimum (which occurs for completely fine-grained histories) is zero.
Another useful quantity is the Lloyd-Pagels (LP) depth defined as

$$D_{LP}(\{c_\alpha\}) = \sum_\alpha p_\alpha \log_2 p_\alpha - \sum_\alpha p_\alpha \log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)] . \tag{2.19}$$

This has a number of useful features. It is a direct measure of the information in a
set of histories; it is invariant under dimensional transformations; and it is invariant
under refinement of the fine-grained net of times.
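Equations (2.17) and (2.19) are simple enough to evaluate directly once the probabilities $p_\alpha$ and the volumes $\mathrm{Tr}(P_\alpha)$ are known. The following sketch is illustrative only; the function names are hypothetical, and it assumes that $\mathrm{Tr}(I)$ in (2.19) is taken over history space, so that $\mathrm{Tr}(I) = V^N$.

```python
import math

def history_space_entropy(p, tr_P):
    """Eq. (2.17): -sum p log2 p + sum p log2 Tr(P), in bits."""
    return sum(-pa * math.log2(pa) + pa * math.log2(tr)
               for pa, tr in zip(p, tr_P) if pa > 0)

def lp_depth(p, tr_P, tr_I):
    """Eq. (2.19): sum p log2 p - sum p log2 [Tr(P)/Tr(I)], in bits."""
    return sum(pa * math.log2(pa) - pa * math.log2(tr / tr_I)
               for pa, tr in zip(p, tr_P) if pa > 0)

# Toy example: V = 4, N = 2, so history space has 16 points, split into two
# classes of 8 fine-grained histories with probabilities 3/4 and 1/4.
p, tr_P, tr_I = [0.75, 0.25], [8, 8], 16
s_fine = history_space_entropy(p, tr_P)
s_coarse = history_space_entropy([1.0], [16])   # both classes merged into one
d_fine = lp_depth(p, tr_P, tr_I)
```

In this toy example merging the two classes raises the entropy to its maximum $N \log_2 V = 4$ bits, in line with (2.16) and (2.18), and the depth of the finer set equals $\log_2 V^N$ minus its entropy.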

To illustrate, consider the entropy of a history consisting of a set of alternatives
$\{P_\alpha\}$ at a single moment of time $t$. This is

$$S_{hs}(\{P_\alpha\}) = -\sum_\alpha p_\alpha \log_2 p_\alpha + \sum_\alpha p_\alpha \log_2 [\mathrm{Tr}(P_\alpha)] + (N-1) \log_2 V . \tag{2.20}$$

This is the entropy that would be obtained from the usual Jaynes construction (1.3)
with the addition of the constant $(N-1) \log_2 V$ representing the missing information
at all the other moments of time. By contrast, the depth

$$D_{LP}(\{P_\alpha\}) = \sum_\alpha p_\alpha \log_2 p_\alpha - \sum_\alpha p_\alpha \log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)] \tag{2.21}$$

is the same as the $-S$ that would be calculated from (1.3), without extra terms. Note
that if we use a dimensionally invariant form of $S_{hs}$, by subtracting a term
$\log_2 V^N$ as suggested above, we would have the simple relationship

$$S_{hs}(\{c_\alpha\}) - \log_2 V^N = -D_{LP}(\{c_\alpha\}) . \tag{2.22}$$
C. Other Entropies of Histories



The history space entropy is not the only information measure that can be associated
with histories. In the following we discuss some others and the relationships
between them.

Isham and Linden's Entropies. 

In their seminal paper on entropy in generalized quantum theory, Isham and
Linden utilize history space to define a one parameter family of entropies based on
the decoherence functional $D(\alpha, \alpha')$ for a decoherent set of coarse-grained
histories. Translated into the notation of this paper their definition reads:

$$I_x(\{c_\alpha\}) = -\sum_\alpha p_\alpha \log_2 p_\alpha + x \sum_\alpha p_\alpha \log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)] . \tag{2.23}$$

As they show explicitly, for $x \geq 1$ the entropy $I_x(\{c_\alpha\})$ possesses the important
property that it increases under coarse-graining of the decoherent set.

As discussed by Isham and Linden, in the case of non-relativistic quantum mechanics,
history space is a repeated tensor product of the Hilbert space of the system,
with one factor for each time. The $P_\alpha$ are projections on this space and the trace
is defined as usual. However, their arguments can be immediately applied to the
classical situations we have been discussing. (We shall return to the quantum mechanical
case in Section V.) Indeed, any classical problem can be considered as a
generalized quantum theory in which all sets of alternative histories decohere
automatically: $D(\alpha, \alpha') = p(\alpha)\, \delta_{\alpha\alpha'}$. The expression (2.23)
thus applies immediately in the classical case. The history space entropy $S_{hs}$ we
arrived at from the Jaynes construction corresponds to $x = 1$, up to a possible overall
renormalization. Isham and Linden mainly consider $x = 2$, but that should not obscure
the fact that our history space entropy, defined through a Jaynes construction, is a
special case of those that they consider. In Section V we will provide a Jaynes
construction for this entropy in quantum mechanics.






Step-by-step entropy.

Consider the special case where the set of coarse-grained histories consists of a
sequence of sets of coarse-grained alternatives at a series of times $t_1, \cdots, t_n$. For
generality we assume that these sets are branch dependent, that is, the sets at time $t_k$
may depend on the specific choice of sets at previous times $t_1 \cdots t_{k-1}$. The
projections at time $t_k$ then have the form

$$P^k_{\alpha_k}(\alpha_{k-1}, \cdots, \alpha_1) . \tag{2.24}$$

At any stage in the sequence $k = 1, \cdots, n$, one can construct the Jaynes entropy
of the set of alternatives $\{P^k_{\alpha_k}\}$ conditional on a particular previous history
$\alpha_{k-1}, \cdots, \alpha_1$. This is, following (2.20),

$$S_k(\{P^k_{\alpha_k}\} \,|\, \alpha_{k-1}, \cdots, \alpha_1) = -\sum_{\alpha_k} p_{\alpha_k | \alpha_{k-1}, \cdots, \alpha_1} \log_2 p_{\alpha_k | \alpha_{k-1}, \cdots, \alpha_1}$$

$$+ \sum_{\alpha_k} p_{\alpha_k | \alpha_{k-1}, \cdots, \alpha_1} \log_2 \{\mathrm{Tr}\,[P^k_{\alpha_k}(\alpha_{k-1}, \cdots, \alpha_1)]\} , \tag{2.25}$$

where $p_{\alpha_k | \alpha_{k-1}, \cdots, \alpha_1}$ is the conditional probability for
$\alpha_k$ given the previous history $\alpha_{k-1}, \cdots, \alpha_1$. In terms of joint
probabilities this is

$$p_{\alpha_k | \alpha_{k-1}, \cdots, \alpha_1} = \frac{p_{\alpha_k \cdots \alpha_1}}{p_{\alpha_{k-1} \cdots \alpha_1}} . \tag{2.26}$$

Average the conditioned entropies (2.25) over past histories weighted by their
probabilities and sum over all the steps from one to $n$ to obtain the step-by-step
entropy $S_{sbs}(\{P^n_{\alpha_n}\}, \cdots, \{P^1_{\alpha_1}\})$:

$$S_{sbs}(\{P^n_{\alpha_n}\}, \cdots, \{P^1_{\alpha_1}\}) = \sum_{k=1}^{n} \sum_{\alpha_{k-1} \cdots \alpha_1} p_{\alpha_{k-1} \cdots \alpha_1}\, S_k(\{P^k_{\alpha_k}\} \,|\, \alpha_{k-1} \cdots \alpha_1) . \tag{2.27}$$

A little algebra using (2.26) and (2.27) is enough to show that, for the case of sets
of alternatives at a series of times, the step-by-step entropy and the history space
entropy are related by

$$S_{hs}(\{c_\alpha\}) = S_{sbs}(\{P^n_{\alpha_n}\}, \cdots, \{P^1_{\alpha_1}\}) + (N-n) \log_2 V . \tag{2.28}$$

That is, they are identical except for a constant term that is the missing information
at the times not specified. Evidently, $S_{sbs} \leq S_{hs}$ for the coarse-grained sets for
which they are both defined.
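The relation (2.28) can be checked numerically on a small example. The sketch below is an illustration, not part of the original analysis: it assumes branch-independent regions with given volumes, takes the joint probabilities of the histories as input, and labels a history by the tuple $(\alpha_1, \ldots, \alpha_n)$. The names `sbs_entropy`, `hs_entropy`, `joint`, and `vols` are hypothetical.

```python
import math
from collections import defaultdict

def sbs_entropy(joint, vols):
    """Step-by-step entropy, Eqs. (2.25)-(2.27).

    joint: {(a_1, ..., a_n): probability}; vols[k][a]: volume of region a at step k.
    """
    total = 0.0
    for k in range(len(vols)):
        upto = defaultdict(float)   # joint probability of the first k+1 alternatives
        prev = defaultdict(float)   # joint probability of the first k alternatives
        for h, p in joint.items():
            upto[h[:k + 1]] += p
            prev[h[:k]] += p
        for h, p in upto.items():
            if p > 0:
                cond = p / prev[h[:k]]          # Eq. (2.26)
                # p_prev * cond * (-log2 cond + log2 volume), with p = p_prev * cond
                total += p * (-math.log2(cond) + math.log2(vols[k][h[k]]))
    return total

def hs_entropy(joint, vols, V, N):
    """History space entropy, Eq. (2.17), with Tr(P) = V**(N-n) * product of volumes."""
    n = len(vols)
    s = 0.0
    for h, p in joint.items():
        if p > 0:
            tr = V ** (N - n) * math.prod(vols[k][a] for k, a in enumerate(h))
            s += -p * math.log2(p) + p * math.log2(tr)
    return s

# Two specified times (n = 2) out of N = 5, on a space of size V = 8.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
vols = [[2, 6], [4, 4]]
difference = hs_entropy(joint, vols, V=8, N=5) - sbs_entropy(joint, vols)
```

For these inputs `difference` should equal $(N-n) \log_2 V = 9$ bits up to rounding.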



D. Dynamically Constrained History Space Entropy 



In constructing the history space entropy $S_{hs}$, the entropy functional (2.13) is
maximized over all probability functions $W(\mathbf{x})$ irrespective of whether they conform
to the same basic dynamical law. $S_{hs}$ is thus the missing information in histories
assuming we are also missing any information about the dynamics.

A dynamical law could be enforced by maximizing $S(W)$ only over the $W$ that
conform to it. For example, by maximizing over the form (2.9), keeping $p_\eta(x|y)$ fixed,
we enforce a particular Markovian dynamical law. The resulting entropy $S_{dc}(\{c_\alpha\})$
we call the dynamically constrained history space entropy. The maximum in (2.14) is
carried out only over the initial distribution $\rho(x)$ with $W(\mathbf{x})$ determined by
enforcing the subsequent dynamics explicitly. Evidently, since this is a constrained maximum,

$$S_{dc}(\{c_\alpha\}) \leq S_{hs}(\{c_\alpha\}) . \tag{2.29}$$

The entropy $S_{dc}(\{c_\alpha\})$ is connected to another entropy of histories obtained by
applying the Jaynes method used in (1.3) to the initial $\rho(x)$, but constraining the
maximum not simply by the requirement that probabilities at one time are reproduced,
but probabilities of a whole set of histories. We call this the initial condition
entropy $S_{ic}(\{c_\alpha\})$.

We can illustrate the construction of $S_{ic}$ in the case of Markovian evolution and
a set of histories that is a sequence of sets of alternatives $\{P^k_{\alpha_k}\}$ at a series
of times $t_k$, $k = 1, \cdots, n$. The probabilities of these histories are given by (2.12)
which we may conveniently write as

$$p_{\alpha_n \cdots \alpha_1} = \mathrm{Tr}(C_{\alpha_n \cdots \alpha_1}\, \rho) , \tag{2.30}$$

that is, the sum or integral of $\rho(x_0)$ with the functions $C_{\alpha_n \cdots \alpha_1}(x_0)$
defined by (2.12). We can now carry out the Jaynes construction

$$S_{ic}(\{c_\alpha\}) = \max_{\tilde\rho} S(\tilde\rho) \Big|_{\mathrm{Tr}(C_{\alpha_n \cdots \alpha_1} \tilde\rho) = \mathrm{Tr}(C_{\alpha_n \cdots \alpha_1} \rho)} \tag{2.31}$$

for all the histories. The density function which realizes this maximum has the form

$$\tilde\rho(x) = \exp\Big[ -\sum_{\alpha_n \cdots \alpha_1} \lambda^{\alpha_n \cdots \alpha_1}\, C_{\alpha_n \cdots \alpha_1}(x) \Big] , \tag{2.32}$$

where the Lagrange multipliers $\lambda^{\alpha_n \cdots \alpha_1}$ are determined by the conditions

$$\mathrm{Tr}(C_{\alpha_n \cdots \alpha_1}\, \tilde\rho) = \mathrm{Tr}(C_{\alpha_n \cdots \alpha_1}\, \rho) . \tag{2.33}$$

The $C_{\alpha_n \cdots \alpha_1}(x)$ are not projections, and there seems no easy way to evaluate
(2.32) and (2.33) explicitly in general. However, $S_{dc}$ and $S_{hs}$ supply upper bounds on
$S_{ic}$ as we shall now show.

Write out the entropy functional (2.13) for $W$ of the form (2.9) to find after a
little algebra

$$S(\rho) = S(W) - \int dx_0\, s(x_0)\, \rho(x_0) , \tag{2.34}$$

where $S(W)$ is (2.13) for $W$ of the form (2.9), and

$$S(\rho) = -\mathrm{Tr}(\rho \log_2 \rho) . \tag{2.35}$$

The entropy $s(x_0)$ is defined by

$$s(x_0) = -\int dx_n \cdots dx_1\, p(x_n, \cdots, x_0) \log_2 p(x_n, \cdots, x_0) , \tag{2.36}$$

where

$$p(x_n, \cdots, x_0) = p(x_n t_n | x_{n-1} t_{n-1}) \cdots p(x_1 t_1 | x_0 t_0) . \tag{2.37}$$

The function $s(x_0)$ is non-negative. Thus

$$S(\rho) \leq S(W) \tag{2.38}$$

for $W$ of the form (2.9). [Note that the Markovian form of the dynamics in (2.37)
is not important; any probability function $p(x_n, \cdots, x_0)$ satisfies this inequality.] On
maximization over $\rho$ we have the inequalities

$$S_{ic} \leq S_{dc} \leq S_{hs} . \tag{2.39}$$

In particular, if on fine-graining $S_{hs}$ is driven to a low value, then $S_{dc}$ and $S_{ic}$
will be as well. We shall use this in what follows.
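For completeness, the "little algebra" leading to (2.34) can be written out. This is a sketch of one way to obtain it, assuming only the factorized form $W = p(x_n, \cdots, x_0)\, \rho(x_0)$ of (2.9) and the normalization $\int dx_n \cdots dx_1\, p(x_n, \cdots, x_0) = 1$ for every $x_0$:

```latex
\begin{align*}
S(W) &= -\int dx_n \cdots dx_0\; p(x_n,\cdots,x_0)\,\rho(x_0)
        \left[\log_2 p(x_n,\cdots,x_0) + \log_2 \rho(x_0)\right] \\
     &= \int dx_0\, \rho(x_0)\, s(x_0) \;-\; \int dx_0\, \rho(x_0) \log_2 \rho(x_0) \\
     &= \int dx_0\, s(x_0)\, \rho(x_0) \;+\; S(\rho) .
\end{align*}
```

The first term of the second line uses the definition (2.36) of $s(x_0)$, and the second term uses the normalization of the transition probabilities. Rearranging gives (2.34), and (2.38) follows whenever $s(x_0)$ is non-negative.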

III. BEHAVIOR OF HISTORY SPACE ENTROPY UNDER FINE-GRAINING

Entropies decrease under fine-graining and increase under coarse-graining. That
immediately follows from the Jaynes construction as the discussion leading to (2.16)
shows. Usually this well-known behavior is considered for variations in levels of
coarse-graining at a given moment of time. However, histories can also be fine-grained in
time. For example, if a set of histories is specified by one set of alternatives at a series
of times, and another set of histories by the same alternatives at more times, then
the second set is a fine-graining of the first.

In this section we examine explicitly the behavior of history space entropy under
fine-graining in three one-dimensional models with simple stochastic evolutionary
laws. They are: a discrete random walk, continuous diffusion, and Brownian motion.
The random walk is the simplest model; diffusion illustrates the modifications necessary
for the continuum; and Brownian motion is a simple example of a non-Markovian
process. In all cases we consider a finest-grained net of $N$ equally-spaced times so
that fine-grained histories are specified by positions $(x_1, \cdots, x_N)$. We consider
coarse-grainings in which these positions are grouped into equal intervals of size $\Delta x$
at a series of times spaced by equal intervals $\Delta t$. We then study history space entropy
for these coarse-grainings as a function of $\Delta x$ and $\Delta t$.
A. Random Walk



We take an initial condition where all histories begin at the initial point $x_0 = 0$
and assume that at each timestep the particle has an equal chance of moving right
($x \to x + 1$) or left ($x \to x - 1$) on a discrete spatial lattice. There are then $2^N$
fine-grained histories with equal probability $1/2^N$ and all other histories have probability 0.
We assume that the lattice has a large finite size $V$ with periodic boundary conditions
relating its ends. The history space entropy is given by (2.17) where $\mathrm{Tr}(P_\alpha)$ is
the number of fine-grained histories in a coarse-grained history $c_\alpha$. For all histories
with the coarse-graining described above this is:

$$\mathrm{Tr}(P_\alpha) = (\Delta x)^{N/\Delta t}\, V^{N(1 - 1/\Delta t)} . \tag{3.1}$$

Simple as this is, it is clear that the number of fine-grained histories increases
rapidly with the number of times $n = N/\Delta t$, and calculating entropies by summing
over all the fine-grained histories in each coarse-grained history rapidly becomes
impractical. Instead, we use a Monte Carlo approach: we generate a large sample of
fine-grained histories, bin them together into coarse-grained classes, and calculate
the entropies from the resulting probability estimates. This technique works in the
continuous case as well.

In Figure 1 we plot the history space entropy $S_{hs}$ of the random walk model as a
function of $\Delta x$ and $\Delta t$. We clearly see that the entropy rises steeply when the
coarse-graining is increased by increasing $\Delta t$ and more moderately as $\Delta x$ is increased.
Increasing $\Delta x$ to $V$ at a fixed time gives the maximal coarse-graining where the only
alternatives are $(I, 0)$. Therefore, increasing $\Delta x$ to $V$ for any fixed value of $\Delta t$ will
give the maximum possible entropy, associated with the alternative $P = I$. That is,
from (3.1) (the $-\sum p \log_2 p$ term vanishes),

$$S^{max}_{hs} = N \log_2 V . \tag{3.2}$$
Figure 1: History space entropy, $S_{hs}$, for the discrete random walk as a
function of coarse-graining scales $\Delta x$ and $\Delta t$. In this system, a particle
begins at $x = 0$ on a 1D lattice of 256 points and moves left or right by 1
position with equal probability at each of $N = 128$ times. The entropy is
measured in bits of missing information. These results were produced by
a Monte Carlo simulation with 100,000 random trajectories; because of
the rapid rise in $S_{hs}$ with coarse-graining, the $\Delta x$ and $\Delta t$ axes are plotted
on a logarithmic scale.
Increasing $\Delta t$ for fixed $\Delta x$ gives a closely related limit. When $\Delta t$ is at its
maximum value of $N$, $S_{hs}$ is the single time entropy plus $(N - 1) \log_2 V$ [cf. (2.28)].
The single time entropy ranges from $\log_2 2$ for $\Delta x = 1$ to $\log_2 V$ for $\Delta x = V$.
Thus, for large $N$ we expect $S_{hs}$ to be essentially $S^{max}_{hs}$ and that behavior is also
illustrated in Figure 1.

The maximum value of $S_{hs}$ for the particular model simulated is $N \log_2 V$, which
in this case is 1024 bits. This is reflected on the plot. At the other extreme, the
minimum entropy occurs for $\Delta t = \Delta x = 1$, and is $S_{hs} = 128$ bits. The finest graining
included in Figure 1 is $\Delta t = \Delta x = 2$, and we see that $S_{hs}$ has already risen steeply
at that point.
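The Monte Carlo procedure described above can be sketched in a few lines. The following is an illustrative reconstruction under stated assumptions, not the authors' code: it samples random walks on a ring of `V` sites, bins them into cells of width `dx` every `dt` steps, estimates $-\sum p \log_2 p$ from the sample frequencies, and adds $\log_2 \mathrm{Tr}(P_\alpha)$ from (3.1), which is the same for every history. The function and parameter names are hypothetical, and `V` is assumed divisible by `dx` and `N` by `dt`.

```python
import math
import random
from collections import Counter

def mc_history_entropy(V, N, dx, dt, samples, seed=0):
    """Monte Carlo estimate of S_hs (in bits) for the discrete random walk."""
    rng = random.Random(seed)
    n = N // dt                                  # number of specified times
    counts = Counter()
    for _ in range(samples):
        x, hist = 0, []
        for t in range(1, N + 1):
            x = (x + rng.choice((-1, 1))) % V    # periodic boundary conditions
            if t % dt == 0:
                hist.append(x // dx)             # cell index at a coarse-grained time
        counts[tuple(hist)] += 1
    # plug-in estimate of -sum p log2 p from the sample frequencies
    shannon = -sum((c / samples) * math.log2(c / samples) for c in counts.values())
    # log2 Tr(P) from Eq. (3.1): (dx)**n * V**(N - n)
    log_tr = n * math.log2(dx) + (N - n) * math.log2(V)
    return shannon + log_tr
```

With `V=256`, `N=128`, and `dx=256` every sample falls in the single history $P = I$, so the estimate reduces to $N \log_2 V = 1024$ bits, as in (3.2). For fine coarse-grainings the number of histories can exceed the number of samples, in which case this plug-in estimate of $-\sum p \log_2 p$ is biased low and larger samples are needed.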

B. Continuous Diffusion

A Markovian diffusion process illustrates the case when $M$ is a continuous space.
Take the transition probability to be

$$p(x_2, t_2 | x_1, t_1) = \frac{1}{\sqrt{\pi D \Delta t}} \exp[-(x_2 - x_1)^2 / D \Delta t] , \tag{3.3}$$

where $D$ is a diffusion constant and $\Delta t = t_2 - t_1$. Assume a finite range size $V$,
divided into cells of size $\Delta x$, and a total duration for the histories of $t_f = N \Delta t$.
Choose $V \gg \sqrt{D t_f}$, so that we needn't worry about boundaries. We again assume
an initial condition where the particle is initially at $x_0$.

Label the intervals of the spatial coarse-graining by an integer $i$, a point lying in
the $i^{th}$ cell if $i \Delta x < x < (i + 1) \Delta x$. The probability of a particle initially at $x_0$
passing through a sequence of $n$ cells $i_1, \ldots, i_n$ at times $t_j = j \Delta t$ is

$$p_{i_1 \cdots i_n} = \int_{i_1 \Delta x}^{(i_1 + 1) \Delta x} dx_1 \cdots \int_{i_n \Delta x}^{(i_n + 1) \Delta x} dx_n \prod_{j=1}^{n} p(x_j, j \Delta t | x_{j-1}, (j-1) \Delta t)$$

$$= \frac{1}{(\pi D \Delta t)^{n/2}} \int_{i_1 \Delta x}^{(i_1 + 1) \Delta x} dx_1 \cdots \int_{i_n \Delta x}^{(i_n + 1) \Delta x} dx_n \exp\Big[ -\sum_{j=1}^{n} \frac{(x_j - x_{j-1})^2}{D \Delta t} \Big] . \tag{3.4}$$
The history space entropy for the continuous case has exactly the same form as (2.17)
in the discrete case, but it is convenient to make use of a dimensionally invariant form
of the entropy, by subtracting a dimensional factor $\log_2 V^N$. Thus, the
$\log_2 \mathrm{Tr}(P_\alpha)$ term in (2.17) becomes $\log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)]$.
Rather than being an integer, as in the discrete case, $\mathrm{Tr}(P_\alpha)$ is a continuous
measure of the coarse-graining of each history. For the coarse-graining described above,
with intervals of size $\Delta x$ and $n = t_f/\Delta t$ times, we get

$$\log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)] = n \log_2 (\Delta x / V) . \tag{3.5}$$

$\Delta x / V \leq 1$, so $\log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)] < 0$ for all but maximally
coarse-grained histories.

There are $(V/\Delta x)^n$ coarse-grained histories. The $-p \log_2 p$ part of the entropy in
(2.17) is maximized in the case when all the histories have equal probabilities. In this
case,

$$\max \sum_\alpha (-p_\alpha \log_2 p_\alpha) = n \log_2 (V / \Delta x) . \tag{3.6}$$

Taking account of (3.5) we see that 0 is the maximum of $S_{hs}$, so that it is never
positive. This is different from usual definitions of entropy, which are logarithms
of large numbers and hence always positive. However, what is important is the change
in $S_{hs}$ under coarse-graining or refinement, not its absolute value.

We can gain some insight by looking at the limiting behavior of $S_{hs}$ for different
levels of coarse-graining. Consider first the coarse-grained limit where $\Delta x \to V$. As
$\Delta x$ becomes large compared to $\sqrt{D t_f}$, it becomes highly improbable that the particle
will ever diffuse outside of a single cell $i$. Thus, in this limit, one history dominates
with a probability $p \approx 1$ while the others are suppressed, $p \approx 0$, and the
$-\sum p \log_2 p$ part of the entropy vanishes. At the same time, the term
$\log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)]$ of (3.5) approaches 0 as well, so this maximal
coarse-graining in $x$ leads to $S_{hs} \to 0$; $S_{hs}$ is maximized by maximal coarse-graining
in $x$.

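Let us go now to the opposite limit, where $\Delta x \ll V$. We can now label the interval
$i_j$ by the value $x_j$ centered in that interval. The probability to go from $x_{j-1}$ to $x_j$ is

$$p(x_j | x_{j-1}) \approx \frac{\Delta x}{\sqrt{\pi D \Delta t}} \exp[-(x_j - x_{j-1})^2 / D \Delta t] . \tag{3.7}$$

The $p \log_2 p$ term for a single history is then

$$p(x_1, \ldots, x_n) \log_2 p(x_1, \ldots, x_n) = -p(x_1, \ldots, x_n) \Big[ \frac{n}{2} \log_2 \Big( \frac{\pi D \Delta t}{\Delta x^2} \Big) + \log_2 e \sum_{j=1}^{n} \frac{(x_j - x_{j-1})^2}{D \Delta t} \Big] . \tag{3.8}$$

Summing over all histories is the same as summing the above expression over all the
$x_j$. These sums can be approximated by integrals, which are readily evaluated to
yield

$$\sum_\alpha (-p_\alpha \log_2 p_\alpha) \approx n \log_2 \Big( \frac{\sqrt{\pi D t_f}}{\Delta x} \Big) - \frac{n}{2} (\log_2 n - \log_2 e) . \tag{3.9}$$

Adding the expression for $\log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)]$ from (3.5) gives for the entropy

$$S_{hs} \approx n \log_2 \Big( \frac{\sqrt{\pi D t_f}}{V} \Big) - \frac{n}{2} (\log_2 n - \log_2 e) < 0 , \tag{3.10}$$

i.e., $S_{hs}$ approaches a constant negative value in the limit of small $\Delta x$ for a fixed $\Delta t$.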

Suppose now that we hold $\Delta x$ fixed and vary the coarse-graining in $t$. If we go
to the maximum coarse-graining $\Delta t = t_f$, we return to the case of alternatives at a
single time. If the probability of the particle being in the interval $i$ is $p_i$, the entropy
is just

$$S_{hs} = -\sum_i p_i \log_2 p_i + \log_2 (\Delta x / V) , \tag{3.11}$$

differing from the usual single-time entropy only by a constant.
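The same Monte Carlo idea carries over to continuous diffusion. The sketch below is again illustrative and rests on several assumptions that are not in the text: increments over the finest step $\eta$ are drawn from a Gaussian of variance $D\eta/2$ (the variance implied by the kernel (3.3)), boundaries are ignored, the range $V$ is centered on the starting point $x = 0$, and the dimensionless form of the entropy with the term (3.5) is returned. All names are hypothetical.

```python
import math
import random
from collections import Counter

def mc_diffusion_entropy(V, D, eta, N, dx, dt_steps, samples, seed=0):
    """Monte Carlo estimate of the dimensionless S_hs (bits) for diffusion."""
    rng = random.Random(seed)
    sigma = math.sqrt(D * eta / 2.0)     # exp(-y**2 / (D*eta)) has variance D*eta/2
    n = N // dt_steps                    # number of coarse-grained times
    counts = Counter()
    for _ in range(samples):
        x, hist = 0.0, []
        for t in range(1, N + 1):
            x += rng.gauss(0.0, sigma)
            if t % dt_steps == 0:
                # cell index, with the range taken as [-V/2, V/2)
                hist.append(math.floor((x + V / 2.0) / dx))
        counts[tuple(hist)] += 1
    shannon = -sum((c / samples) * math.log2(c / samples) for c in counts.values())
    return shannon + n * math.log2(dx / V)   # add the term of Eq. (3.5)
```

For `dx = V` the estimate is 0, the maximum, and it becomes negative as `dx` or `dt_steps` is reduced, which is the qualitative behavior described in this section. As in the discrete case, the plug-in estimate of $-\sum p \log_2 p$ is only reliable when the sample is large compared with the number of histories that carry appreciable probability.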
Figure 2: History space entropy $S_{hs}$ for continuous diffusion as a function
of coarse-graining scales $\Delta x$ and $\Delta t$, in bits. All particles begin at $x = 0$ on
a 1D manifold of length $V = 20$, and spread with diffusion constant $D = 1$
through a finest-grained net of $N = 1024$ times with minimal timestep
$\eta = 0.01$. The finest-grained cell size is $\Delta x = 0.1$. We have subtracted off
the maximum entropy $N \log_2 V$ to render our results invariant under
dimensional rescaling and refinements in time; the maximum entropy is
thus 0, and $S_{hs}$ is not bounded below. These results were produced by a
Monte Carlo simulation with 10,000 random trajectories.
If instead we refine the description in time the result is quite different. As the
timestep $\Delta t$ becomes small compared to $\Delta x^2 / D$, the probability of a particle moving
from one interval to another in a timestep becomes small as well. Beyond that point,
refining the description of the system in time does not increase the actual number of
alternative histories with non-zero probabilities. Thus,

$$-\sum p \log_2 p \to \text{const.} \tag{3.12}$$

The $\log_2 [\mathrm{Tr}(P_\alpha)/\mathrm{Tr}(I)] = n \log_2 (\Delta x / V)$ term, however, does change
as we increase $n$. Because this term is negative, as we increase the fine-graining in $t$, the
history space entropy $S_{hs}$ decreases without limit.

In both $x$ and $t$, the entropy is diminished by making the description more fine-grained.
Thus, we expect the same behavior as in the simple random walk: the
entropy $S_{hs}$ (and thus, all other measures of entropy for histories that we have
considered) will be minimized by the most fine-grained description. We performed a
numerical calculation to generate the entropy plot in Figure 2. Note that the qualitative
behavior is exactly the same as in Figure 1.

C. Brownian Motion

In the previous examples, we assumed an explicitly Markovian time-evolution. If
we relax that assumption and suppose that the probability of a history $p(x_1, \ldots, x_n)$
does not have the form $p(x_n | x_{n-1}) \cdots p(x_2 | x_1)\, p(x_1)$, are our conclusions affected?

As a simple example of a non-Markovian process, consider a particle undergoing
Brownian motion. In addition to inertia and dissipation, the particle is subjected to
a stochastic force. We can write a stochastic differential equation for its motion in
Itô form:

$$dx = (p/m)\, dt , \qquad dp = -2\Gamma p\, dt + \sigma\, d\xi , \tag{3.13}$$
where $d\xi$ is a stochastic differential variable with zero mean and variance $dt$,

$$M(d\xi) = 0 , \quad M(d\xi^2) = dt . \tag{3.14}$$

This stochastic equation corresponds to a Fokker-Planck equation for probability
densities $\rho(x, p, t)$ in phase space [13]:

$$\frac{\partial \rho}{\partial t} = -\Big( \frac{p}{m} \Big) \frac{\partial}{\partial x} \rho(x, p) + 2\Gamma \frac{\partial}{\partial p} [p\, \rho(x, p)] + \frac{\sigma^2}{2} \frac{\partial^2}{\partial p^2} \rho(x, p) . \tag{3.15}$$

We can enumerate a set of coarse-grained histories for Brownian motion just as
we did for the continuous random walk, dividing up the range $V$ into cells of size $\Delta x$
and dividing the total time of the histories $t_f$ into $n$ steps of $\Delta t$ each. An individual
coarse-grained history consists of all fine-grained histories which pass through a given
set of intervals $i_1, \ldots, i_n$ at times $t_j = j \Delta t$.

Histories of $x(t)$ are not Markovian because of the existence of the inertia term
$-(p/m)\, \partial \rho / \partial x$ in (3.15). However, looked at over relatively long times
$\Delta t \gg 1/\Gamma$ the inertia becomes unimportant, as dissipation dominates. On these long
timescales, the system is well approximated by the continuous diffusion model (3.3) with
$D = \sigma^2 / 8 \Gamma^2 m^2$. On very short timescales, by contrast, inertia dominates. The
particle drifts at a near-constant velocity, only slightly deflected by dissipation and noise.
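Sample paths of (3.13) can be generated with a simple Euler-Maruyama discretization. The sketch below is one possible scheme, not necessarily the one used for the numerical results in this paper; the step size `eta`, the argument `gamma2` standing for $2\Gamma$, and the function name are assumptions made for illustration.

```python
import math
import random

def brownian_path(gamma2, sigma, m, eta, N, rng, x0=0.0, p0=0.0):
    """Euler-Maruyama path of dx = (p/m) dt, dp = -2*Gamma*p dt + sigma dxi.

    gamma2 stands for 2*Gamma. Returns the positions at the N fine-grained times.
    """
    x, p, xs = x0, p0, []
    for _ in range(N):
        x_new = x + (p / m) * eta
        # the noise increment d(xi) has zero mean and variance eta, cf. Eq. (3.14)
        p_new = p - gamma2 * p * eta + sigma * rng.gauss(0.0, math.sqrt(eta))
        x, p = x_new, p_new
        xs.append(x)
    return xs

# Parameters quoted in the caption of Figure 3: 2*Gamma = 1, sigma = 1, m = 1.
path = brownian_path(gamma2=1.0, sigma=1.0, m=1.0, eta=0.01, N=1024,
                     rng=random.Random(0))
```

The positions returned can be binned into cells of size $\Delta x$ at times spaced by $\Delta t$ exactly as for diffusion; only $x$ is recorded, which is why the coarse-grained histories of $x(t)$ are not Markovian even though the pair $(x, p)$ is.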

We see that the same arguments we used in the case of continuous diffusion apply
to this case with little modification. Fine-graining in $t$ reduces the entropy without
limit. Fine-graining in $x$ is a little less clear, but a similar argument can be made. In
the limit of fine-grained $x$, we can approximate the probability of a history as

$$p(x_1, \ldots, x_n) = (\Delta x / Q_0)^n f(x_1, \ldots, x_n) , \tag{3.16}$$

where $f(x_1, \ldots, x_n)$ is dimensionless, and $Q_0$ is a constant with units of length which
depends on $\Delta t$ but not $\Delta x$. We can replace the sum over all histories in (2.17) with
$n$ integrals over the $x_j$, and get

$$S_{hs} = -\frac{1}{Q_0^n} \int dx_1 \cdots dx_n\, f(x_1, \ldots, x_n) \log_2 f(x_1, \ldots, x_n) - n \log_2 (\Delta x / Q_0) + n \log_2 (\Delta x / V)$$

$$= S_0 + n \log_2 (Q_0 / V) , \tag{3.17}$$

where $S_0$ has no $\Delta x$ dependence. Since the $\Delta x$ dependence has dropped out
completely, we see that in this case as well the entropy approaches a constant as we
fine-grain in $x$.

In Figure 3 we show the numerical results for the entropy of coarse-grained histo- 
ries of the Brownian motion model as a function of coarse-graining in x and t. This 
graph clearly shows essentially the same behavior of Shs with coarse-graining as in 
Figures 1 and 2. 
Figure 3: History space entropy, $S_{hs}$, for Brownian motion as a function
of coarse-graining scales $\Delta x$ and $\Delta t$, in bits. All particles begin with
$(x, p) = (0, 0)$ on a 1D manifold of length $V = 20$, with dissipation $2\Gamma = 1$,
noise strength $\sigma = 1$, and mass $m = 1$. The finest-grained net of times and
minimum cell size are as in Figure 2, and the same conventions are used
here in displaying $S_{hs}$. These results were produced by a Monte Carlo
simulation with 10,000 random trajectories.
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IV. THE SECOND LAW FOR HISTORIES 
A. The Increase of Entropies 



The familiar second law of thermodynamics concerns the behavior of the entropy 
of a fixed set of coarse-grained alternatives at a moment of time as this time is varied. 
We shall call such entropies "single-time entropies". 

If the value of a single-time entropy at some particular time $t_0$ is all that is known 
about a system, and if that value is much lower than the maximum (equilibrium) 
value, then that entropy will subsequently tend to increase for most dynamical laws 
of interest. If the dynamical law is time symmetric about $t_0$, then the approach to 
equilibrium will also be symmetric about $t_0$. However, it is not just this statistical 
tendency to approach equilibrium that is usually meant by the second law of 
thermodynamics. Rather, it is the general increase in entropy of suitable coarse-grained 
descriptions of the universe since the big bang. In particular, what is meant is that, 
for the most part, certain entropies of presently isolated systems are increasing in 
the same direction of time. The time-asymmetric increase of these entropies of the 
universe arises from a cosmological initial condition at which those entropies were 
low. As Boltzmann put it, "The second law of thermodynamics can be proved from 
the mechanical theory if one assumes that the present state of the universe . . . started 
to evolve from an improbable state." 

The entropies that are most useful in chemistry and physics are associated with 
quasiclassical coarse-grainings which fix the values of averages over suitable volumes 
of densities of approximately conserved quantities such as energy, momentum, and 
abundances of chemical and nuclear species. Their utility arises from the approximate 
conservation. The small volumes over which the averages are taken reach local 
equilibrium on short time scales, leaving the approach to equilibrium between volumes 
to be described by phenomenological equations such as the Navier-Stokes equation 
over longer time scales. The single-time entropy of these coarse-grainings is low in 
the early universe, leading to a general tendency to increase. 

Statements of the second law often refer to the increase of "the" entropy as though 
there were only one possible coarse-grained description for which it holds. What is 
meant by "the" entropy is usually the single-time entropy of the alternatives defining 
the quasiclassical realm of everyday experience described above. However, we should 
expect the general increase of the entropy of any set of coarse-grained alternatives 
which is low in the initial moments of the universe. To give just one example, the 
single-time entropy of a set of quasiclassical alternatives $\{P_\alpha\}$ increases with time 
when conditioned on various other quasiclassical alternatives $\{P_\beta\}$. Indeed, such 
entropies 



are the ones of practical interest. The entropy of a gas inside a piston is the entropy 
of alternatives referring to the gas given the configuration of the piston. There are 
thus a variety of coarse-grainings and conditions for which the missing information 
increases with time. 



B. The Increase in History Entropies 



Sets of alternative, coarse-grained histories provide more general coarse-grained 
descriptions of the universe than sets of coarse-grained alternatives at merely one 
time. The corresponding entropies of histories should also increase with time if they 
are low at the time of the system's initial condition. For example, consider a set of 
histories consisting of a series of alternatives $\{P^1_{\alpha_1}\}, \cdots, \{P^n_{\alpha_n}\}$ at a sequence of times 
$t_1, \cdots, t_n$ giving a histories entropy 

$$S_{hs}\left(\{P^n_{\alpha_n}\}, t_n; \cdots; \{P^1_{\alpha_1}\}, t_1\right) . \tag{4.2}$$

If $S_{hs}$ is initially low, and these times are all translated forward by an amount $T$, we 
would expect 

$$S_{hs}\left(\{P^n_{\alpha_n}\}, t_n + T; \cdots; \{P^1_{\alpha_1}\}, t_1 + T\right) \tag{4.3}$$

to increase with $T$. 

A proof of the second law even for single-time entropies exists only for highly idealized 
situations.^ That is partly because entropy does not monotonically increase but 
fluctuates about an increasing trend. We can therefore hardly expect a mathematical 
proof of the increase of (4.3) with $T$. However, the connection of $S_{hs}$ with the 
step-by-step entropy supports this in the following way. 

Consider histories consisting of alternatives at just two times $t_1$ and $t_2$. Then from 
( P:77| ) and ( F^ 

$$S_{hs} = \sum_{\alpha_1} \cdots + S\left(\{P^1_{\alpha_1}\}, t_1\right) + \text{const.} , \tag{4.4}$$

where the constant is independent of $t_1$, $t_2$ and the alternatives. As $t_1$ increases, the 
second term in (4.4) increases. That is just the usual second law. The first term can 
also be expected to increase as both $t_1$ and $t_2$ move away from a low entropy initial 
condition, provided $\{P^1_{\alpha_1}\}$ is sufficiently coarse-grained that the initial condition plays 
an important role in determining future probabilities. 
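
The split into a single-time term and a term conditioned on the first alternative has the same structure as the standard chain rule for Shannon entropy, $H(\alpha_1, \alpha_2) = H(\alpha_1) + \sum_{\alpha_1} p_{\alpha_1} H(\alpha_2 | \alpha_1)$. The following sketch checks that rule numerically for an arbitrary two-time joint distribution. It covers only the $-\sum p \log_2 p$ part of a history space entropy; the trace terms and the additive constant are omitted, and the joint distribution is random illustrative data.

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits of a (possibly multi-dimensional) distribution."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)
joint = rng.random((4, 5))
joint /= joint.sum()                 # joint[a1, a2] = p(alpha_1, alpha_2)
p1 = joint.sum(axis=1)               # single-time probabilities p(alpha_1)

# Average over alpha_1 of the entropy of alpha_2 conditioned on alpha_1.
conditional = sum(p1[a] * shannon(joint[a] / p1[a]) for a in range(len(p1)))

lhs = shannon(joint)
rhs = conditional + shannon(p1)
```

Here `lhs` and `rhs` agree to rounding error, so any growth of the two-time entropy can be attributed either to the single-time term or to the conditional term.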

The sequence of times necessary to specify a set of histories presents a variety 
of possibilities for investigating the change in entropy. We have already discussed a 
uniform translation of all the times. However, we could also discuss increasing the 
separation between the times. For example, in the two time case of (4.4), $S_{hs}$ increases 
when $t_1$ is held fixed and $t_2 - t_1$ increases. Indeed, that is just a special case of the usual 
second law [cf. (4.1)]. 



^See, e.g. p| 

C. The Urn Model 



An exactly soluble model which nicely illustrates the increase in history space 
entropy is the urn model of P. and T. Ehrenfest [16]. The model concerns $2R$ numbered 
balls, each of which is in one of two urns, A or B. The system evolves through 
discrete time steps. At each time a number from 1 to $2R$ is chosen at random and that 
ball is moved from its present urn to the other. Fine-grained histories are specified by 
giving the urn containing each ball at each of the $N$ times. A simple kind of coarse-grained 
history specifies the number of balls in one urn, say A, at one time $t$. The kind of 
multi-time, coarse-grained histories we shall study are specified by giving the number 
of balls in A, $(n_1, \cdots, n_n)$, at a sequence of the $N$ times $t_1, \cdots, t_n$. 

The probabilities relevant for constructing the entropies can be worked out [Ï^JÏ^I • 
The probability of a transition from one time to the next is: 

$$p(n_{j+1}, t_{j+1} | n_j, t_j) = \frac{2R - n_j}{2R}\,\delta_{n_{j+1},\, n_j + 1} + \frac{n_j}{2R}\,\delta_{n_{j+1},\, n_j - 1} . \tag{4.5}$$

Given that the number of balls in urn A is $n_0$ at time $t_0$, the probability that A will 
contain $n_j$ balls at time $t_j$ is: 

$$p(n_j, t_j | n_0, t_0) = (-1)^{n_0}\, 2^{-2R} \sum_{l=-R}^{R} (l/R)^j\, C^{(l)}_{n_j}\, C^{(R-n_0)}_{R+l} , \tag{4.6}$$

where the coefficients $C^{(l)}_k$ are defined by the identity 

$$(1 - z)^{R-l} (1 + z)^{R+l} = \sum_{k=0}^{2R} C^{(l)}_k z^k . \tag{4.7}$$



All the rest of the probabilities we shall need are easily constructed from (4.5) and (4.6). 
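
These probabilities are easy to generate numerically. The sketch below builds the one-step transition matrix of (4.5), obtains the multi-step probabilities by matrix powers, and compares them with a Kac-type closed sum built from the coefficients of (4.7). The function names and the particular placement of indices in `closed_form` are assumptions of this illustration; the matrix power is the reference against which the sum is checked.

```python
import numpy as np

def transition_matrix(R):
    """T[n_new, n_old] from (4.5) for 2R balls; n counts the balls in urn A."""
    N = 2 * R
    T = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n < N:
            T[n + 1, n] = (N - n) / N   # a ball from B is chosen and moved to A
        if n > 0:
            T[n - 1, n] = n / N         # a ball from A is chosen and moved to B
    return T

def coeffs(R, l):
    """Coefficients C^{(l)}_k of (1 - z)^(R - l) (1 + z)^(R + l), k = 0..2R."""
    c = np.array([1.0])
    for _ in range(R - l):
        c = np.convolve(c, [1.0, -1.0])
    for _ in range(R + l):
        c = np.convolve(c, [1.0, 1.0])
    return c

def closed_form(R, n0, nj, j):
    """Kac-type sum for p(n_j, t_j | n_0, t_0); the index placement is an assumption."""
    c0 = coeffs(R, R - n0)
    total = sum((l / R) ** j * coeffs(R, l)[nj] * c0[R + l]
                for l in range(-R, R + 1))
    return (-1) ** n0 * total / 4 ** R

R, n0, j = 3, 6, 5
by_matrix = np.linalg.matrix_power(transition_matrix(R), j)[:, n0]
by_sum = np.array([closed_form(R, n0, nj, j) for nj in range(2 * R + 1)])
```

For larger $R$ the matrix power is the more practical route, since the alternating sum involves large cancelling binomial-type coefficients.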



Consider, by way of example, the history space entropy for the set of histories 
specified by giving the number of balls in A at two times $t_j$ and $t_{j+m}$ assuming an 
initial condition in which no balls are in A at $t_0$. We call these "two time histories" 
for short. From (2.17) this is 

$$\begin{aligned} S_{hs}(\{n_{j+m}, n_j\}) = {} & -\sum_{n_{j+m},\, n_j} p(n_{j+m}, n_j | n_0) \log_2 p(n_{j+m}, n_j | n_0) \\ & + \sum_{n_{j+m},\, n_j} p(n_{j+m}, n_j | n_0) \log_2 \left[ \binom{2R}{n_{j+m}} \binom{2R}{n_j} \right] \\ & + (N - 2) \log_2 \left( 2^{2R} \right) . \end{aligned} \tag{4.8}$$

The probability $p(n_{j+m}, n_j | n_0)$ is obtained by multiplying (4.6) by a factor of (4.5) for 
each of the $m$ time steps between $t_j$ and $t_{j+m}$ and summing over the intermediate values 
of $n_k$, $j < k < j + m$. There are $2^{2R}$ ways of arranging the balls among the urns at each 
time, and a binomial coefficient gives the number of arrangements of balls in 
which $n$ are in urn A. Thus, 

$$\mathrm{Tr}(I) = 2^{2R} , \qquad \mathrm{Tr}(P_n) = \binom{2R}{n} . \tag{4.9}$$
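
A direct numerical evaluation of (4.8) is straightforward. The following sketch is one possible implementation, not necessarily the one used to produce the figures: it forms the joint two-time distribution from matrix powers of the one-step transition matrix (4.5) and adds the binomial trace terms of (4.9). The function names are hypothetical, and all $2R$ balls are taken to start in urn A, as in the Figure 4 caption.

```python
import numpy as np
from math import comb, log2

def transition_matrix(R):
    """T[n_new, n_old] from (4.5) for 2R balls; n counts the balls in urn A."""
    N = 2 * R
    T = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n < N:
            T[n + 1, n] = (N - n) / N
        if n > 0:
            T[n - 1, n] = n / N
    return T

def two_time_entropy(R, n0, j, m, N=3):
    """S_hs of (4.8) for the ball counts at times t_j and t_{j+m}, in bits."""
    T = transition_matrix(R)
    p_j = np.linalg.matrix_power(T, j)[:, n0]        # p(n_j | n_0)
    step = np.linalg.matrix_power(T, m)              # p(n_{j+m} | n_j)
    joint = step * p_j[np.newaxis, :]                # joint[n_{j+m}, n_j]
    log_binom = np.array([log2(comb(2 * R, n)) for n in range(2 * R + 1)])
    log_tr = log_binom[:, np.newaxis] + log_binom[np.newaxis, :]
    mask = joint > 0                                 # skip impossible histories
    s = -(joint[mask] * np.log2(joint[mask])).sum() + (joint[mask] * log_tr[mask]).sum()
    return float(s + (N - 2) * 2 * R)                # unspecified times add 2R bits each

R = 15
early = two_time_entropy(R, n0=2 * R, j=1, m=1)
late = two_time_entropy(R, n0=2 * R, j=30, m=1)
upper_bound = 3 * 2 * R                              # log2 of Tr(I) over N = 3 times
```

The value at the later time exceeds the value at the earlier one, and both stay below $N \log_2 \mathrm{Tr}(I) = 90$ bits, the largest value any distribution over these histories can give.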



Figure 4: History space entropy, $S_{hs}$, for two time histories of the Ehrenfest 
urn model as a function of $t_1$ and $m = t_2 - t_1$, in bits. In the case 
shown there are $2R = 30$ numbered balls distributed between urns A and 
B, with all the balls initially in urn A. In Figures 4 and 5 we have set the 
total number of fine-grained times arbitrarily at $N = 3$; a larger, more 
realistic number would merely add a constant displacement to $S_{hs}$. 

It takes of order $2R$ time steps to share information among the $2R$ balls, and 
that is the order of the characteristic relaxation time for entropies to increase to their 
maximum value [|r^. This is the case for the entropies of two time histories as $t_1$ 
and $t_2$ are increased keeping their difference constant; this was suggested by (4.4) 
and shown by Figure 4. The relation (4.4) shows that the maximum value (not 
including the neglected times) is roughly twice the maximum entropy for single time 
coarse-grainings of this type. 

This relation also indicates that $S_{hs}$ should grow with the same characteristic 
relaxation time as $t_2 - t_1$ is increased, keeping $t_1$ fixed. The increase comes from 
the first term in (4.4). Again, the maximum value reached lies between one and two 
times the maximum for single-time coarse-grainings by the number of balls in one 
urn. This behavior is also evident in Figure 4 (though for large $t_1$ the increase is 
almost saturated at the initial time). 

Increasing the number of times included in each history is a fine-graining. At 
a given value of $t_1$, the entropy should decrease as more times are included. This 
behavior is illustrated in Figure 5 for $2R = n_0 = 30$. This shows the behavior of one, 
two, and three time history space entropies as a function of $t_1$ where $t_2 = t_1 + 1$, and 
$t_3 = t_1 + 2$. All the entropies increase to maximum values on roughly the time scale 
$R$. Asymptotically from (4.4), the entropies behave like 

$$S_1(t_1) - c \tag{4.10}$$

where $S_1$ is the single-time entropy and $c$ is independent of $t_1$ for the urn model but 
depends on the number of times and the values of the time differences. 

Figure 5: History space entropy, $S_{hs}$, for one, two, and three time histories 
of the Ehrenfest urn model versus the first specified time $t_1$, in 
bits. The times of the two and three time histories are separated by single 
timesteps. The parameters and initial conditions are the same as in 
Figure 4. 
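
The ordering of the one, two, and three time entropies can be reproduced with a short calculation. The sketch below is an illustration under the same assumptions as the figure captions ($2R = n_0 = 30$, $N = 3$, consecutive times); the function name `consecutive_time_entropy` is hypothetical. It extends the joint distribution one time step at a time using the transition rule (4.5) and evaluates $-\sum p \log_2 p + \sum p \log_2 \mathrm{Tr}(P)$.

```python
import numpy as np
from math import comb, log2

def consecutive_time_entropy(R, n0, t1, k, N=3):
    """S_hs for ball counts at the k consecutive times t1, t1+1, ..., in bits."""
    size = 2 * R + 1
    T = np.zeros((size, size))                   # T[n_new, n_old], as in (4.5)
    for n in range(size):
        if n < 2 * R:
            T[n + 1, n] = (2 * R - n) / (2 * R)
        if n > 0:
            T[n - 1, n] = n / (2 * R)
    log_binom = np.array([log2(comb(2 * R, n)) for n in range(size)])
    prob = np.linalg.matrix_power(T, t1)[:, n0]  # p(n_1 | n_0); last axis = latest time
    log_tr = log_binom.copy()
    for _ in range(k - 1):
        # Append one more time: multiply by the one-step transition probability.
        prob = prob[..., np.newaxis] * T.T       # new last axis is the next count
        log_tr = log_tr[..., np.newaxis] + log_binom
    mask = prob > 0
    s = -(prob[mask] * np.log2(prob[mask])).sum() + (prob[mask] * log_tr[mask]).sum()
    return float(s + (N - k) * 2 * R)            # each unspecified time adds 2R bits

s1, s2, s3 = [consecutive_time_entropy(R=15, n0=30, t1=10, k=k) for k in (1, 2, 3)]
s2_early = consecutive_time_entropy(R=15, n0=30, t1=2, k=2)
s2_late = consecutive_time_entropy(R=15, n0=30, t1=20, k=2)
```

At fixed $t_1$ the values satisfy `s3 <= s2 <= s1`, as expected for successive fine-grainings, while each entropy grows as $t_1$ moves away from the initial condition.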

V. QUANTUM HISTORY SPACE ENTROPY 

Isham and Linden posited their family of entropies (2.23) on the basis of the 
property that they decrease under fine-graining. We were able to show that the 
classical analogs could be derived from a Jaynes construction for the case $x = 1$. In 
this section we show that quantum history space entropy can be similarly derived as 
a preliminary to a more general discussion of its connection with other entropies.^ 

Consider a set of decoherent alternative histories $\{c_\alpha\}$, each history with a 
probability $p_\alpha$ and represented in history space by a projector $P_\alpha$. Define an entropy 
functional on history space operators $W$ by 

$$S(W) = -\mathrm{Tr}\left(W \log_2 W\right) . \tag{5.1}$$

Then maximize $S(W)$ over all $W$ for which (5.1) is real, subject to the condition that 

$$\mathrm{Tr}(P_\alpha W) = p_\alpha . \tag{5.2}$$

The result is that the maximum is given by 

$$W = \sum_\alpha \frac{p_\alpha P_\alpha}{\mathrm{Tr}[P_\alpha]} , \tag{5.3}$$

and the entropy is: 

$$S_{hs}(\{c_\alpha\}) = -\sum_\alpha p_\alpha \log_2 p_\alpha + \sum_\alpha p_\alpha \log_2 \mathrm{Tr}(P_\alpha) , \tag{5.4}$$

analogous to (2.17). 
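
The maximization can be checked numerically in a small example. The sketch below uses illustrative data that are not taken from the paper: three mutually orthogonal projectors of ranks 1, 2, and 3 on a six-dimensional space, with assumed probabilities. It verifies that the operator of (5.3) has the entropy (5.4), and that another positive operator satisfying the same constraints (5.2) has a lower entropy.

```python
import numpy as np

def von_neumann_entropy(W):
    """-Tr(W log2 W) for a Hermitian, positive semidefinite W."""
    evals = np.linalg.eigvalsh(W)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

# Illustrative data (assumptions): orthogonal projectors of ranks 1, 2, 3.
ranks = [1, 2, 3]
probs = np.array([0.5, 0.3, 0.2])
dim = sum(ranks)
projectors, start = [], 0
for r in ranks:
    P = np.zeros((dim, dim))
    P[start:start + r, start:start + r] = np.eye(r)
    projectors.append(P)
    start += r

# Maximum entropy operator of (5.3) and the closed-form entropy of (5.4).
W_max = sum(p * P / np.trace(P) for p, P in zip(probs, projectors))
s_formula = float(-(probs * np.log2(probs)).sum() + (probs * np.log2(ranks)).sum())

# A competitor with the same Tr(P_alpha W) but unequal weights inside one cell.
W_other = W_max.copy()
W_other[1, 1] += 0.05
W_other[2, 2] -= 0.05

s_max = von_neumann_entropy(W_max)
s_other = von_neumann_entropy(W_other)
```

Spreading each probability uniformly over the support of its projector is what maximizes the entropy; any redistribution within a cell that keeps the constraints lowers it.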

The Jaynes construction immediately makes clear why the $x = 1$ history space 
entropy decreases on fine-graining. There are more conditions constraining the 
maximum in (2.14) in a fine-graining of a set than in the set itself. The maximum can 
therefore only be lower. For other values of $x$ it is sufficient to note that 

$$I_x(\{c_\alpha\}) = S_{hs}(\{c_\alpha\}) - \mathrm{Tr}(I) + (x - 1) \sum_\alpha p_\alpha \log_2 \left[\mathrm{Tr}(P_\alpha)\right] . \tag{5.5}$$

This too decreases with fine-graining, as follows from the result for $I_1$ and the 
convexity of the logarithm. 
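
The decrease under fine-graining can be seen directly from the formula $-\sum p \log_2 p + \sum p \log_2 \mathrm{Tr}(P)$. The sketch below uses random illustrative probabilities and hypothetical projector ranks; it merges neighbouring pairs of histories, so that both probabilities and traces add, and compares the two entropies. The inequality it exhibits is the log-sum inequality.

```python
import numpy as np

def history_space_entropy(probs, traces):
    """S_hs = -sum p log2 p + sum p log2 Tr(P), skipping zero-probability histories."""
    probs = np.asarray(probs, dtype=float)
    traces = np.asarray(traces, dtype=float)
    keep = probs > 0
    return float(-(probs[keep] * np.log2(probs[keep] / traces[keep])).sum())

rng = np.random.default_rng(2)
fine_p = rng.random(8)
fine_p /= fine_p.sum()
fine_tr = rng.integers(1, 6, size=8)          # hypothetical projector ranks

# Coarse-grain by merging neighbouring pairs: probabilities and traces both add.
coarse_p = fine_p.reshape(4, 2).sum(axis=1)
coarse_tr = fine_tr.reshape(4, 2).sum(axis=1)

s_fine = history_space_entropy(fine_p, fine_tr)
s_coarse = history_space_entropy(coarse_p, coarse_tr)
```

Equality would hold only if the probability within each merged pair were already proportional to the traces, which is the maximum entropy assignment of the coarser description.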



^The authors have benefited from many discussions with M. Gell-Mann on this issue. 

Thus, history space entropy can be given a unified construction through a Jaynes 
procedure both classically and quantum mechanically. What can be done classically 
but not quantum mechanically is to express the probabilities for all decoherent 
histories in the form 

$$p_\alpha = \mathrm{Tr}(P_\alpha W) \tag{5.6}$$

for one positive operator $W$, independent of the set of alternatives. There is no 
quantum mechanical analog of (2.9). Were there one, quantum mechanics would be 
equivalent to a classical stochastic theory. It is possible to find history space operators 
$W$ which reproduce the probabilities $p_\alpha$ through (5.6) for any decoherent set. For 
example, valid expressions for the probabilities of decoherent histories like 



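$$p_\alpha = \mathrm{Tr}\left(P^n_{\alpha_n}(t_n) \cdots P^1_{\alpha_1}(t_1)\, \rho\right) \tag{5.7}$$

can be transcribed into history space using the identity [12] 

$$\mathrm{Tr}_{\mathcal{H}}(A_1 \cdots A_n) = \mathrm{Tr}_{\otimes^n \mathcal{H}}\left[(A_1 \otimes \cdots \otimes A_n)\, S\right] \tag{5.8}$$

where 

$$S\, |v_1\rangle \otimes \cdots \otimes |v_n\rangle = |v_n\rangle \otimes |v_1\rangle \otimes \cdots \otimes |v_{n-1}\rangle . \tag{5.9}$$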
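
An identity of this kind is easy to test with small random matrices. The sketch below is an illustration with hypothetical names; it builds the cyclic shift $S$ on three copies of a two-dimensional space and evaluates the trace on the tensor product. The ordering of the operators inside the ordinary trace depends on the direction of the shift: with the row-major Kronecker convention used here, the shift as written above yields $\mathrm{Tr}(A_3 A_2 A_1)$ and its inverse yields $\mathrm{Tr}(A_1 A_2 A_3)$. For two operators the two orderings coincide.

```python
import numpy as np
from itertools import product

def cyclic_shift(d, n):
    """Matrix of S with S|v_1>...|v_n> = |v_n>|v_1>...|v_{n-1}> on n copies of C^d."""
    S = np.zeros((d**n, d**n))
    for idx in product(range(d), repeat=n):
        shifted = (idx[-1],) + idx[:-1]
        col = np.ravel_multi_index(idx, (d,) * n)
        row = np.ravel_multi_index(shifted, (d,) * n)
        S[row, col] = 1.0
    return S

rng = np.random.default_rng(3)
d = 2
A1, A2, A3 = (rng.standard_normal((d, d)) for _ in range(3))
kron = np.kron(np.kron(A1, A2), A3)     # A1 (x) A2 (x) A3, row-major index order
S = cyclic_shift(d, 3)

with_shift = np.trace(kron @ S)         # equals Tr(A3 A2 A1)
with_inverse = np.trace(kron @ S.T)     # equals Tr(A1 A2 A3)
```

Since $S$ is a permutation matrix, its transpose is its inverse, so both orderings are available from the same construction.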

However, the resulting $W$'s are not positive, even when they can be arranged to be 
Hermitean. For this reason, even though quantum analogs of $S_{dc}(\{c_\alpha\})$ and $S_{ic}(\{c_\alpha\})$ 
can be defined, the derivations of the inequalities relating them to $S_{hs}(\{c_\alpha\})$ like 
(2.39) do not immediately generalize to quantum mechanics. 

VI. CONCLUSIONS 

Information is contained not only in sets of alternatives at a single moment of time, 
but more generally in sets of alternative histories, that is, sequences of sets of 
alternatives at a series of times. A variety of measures of the information in histories 
are available. In this paper we have provided a unified construction of all of these 
through the Jaynes procedure. It follows from these constructions that these entropies 
decrease under fine-graining and increase under coarse-graining. We illustrated this in 
a few simple models. 

We expect entropies for histories to share other common properties analogous to 
the usual second law of thermodynamics. In particular, the entropy of a set of histories 
should increase as that set is translated forward in time away from a low entropy initial 
condition. We illustrated this with the classical urn model, but expect it to hold for 
more realistic dynamical laws, both classically and quantum mechanically. 

General sets of alternative coarse-grained histories will not exhibit deterministic 
correlations in time in a classical stochastic theory. However, sufficiently 
coarse-grained sets of histories may exhibit deterministic behavior. For example, the 
unpredictable motion of single atoms yields nearly deterministic laws for the hydrodynamic 
variables of pressure, temperature, and density. Characterizing the level of determinism 
is an interesting question related to the search for measures of classicality in 
quantum theory. It is clear from our discussion that no entropy of histories is a 
measure of determinism. Entropy is reduced by fine-graining, and the finest-grained 
histories are not deterministic. In quantum theory we can, therefore, not expect an 
entropy of histories, by itself, to be a measure of classicality. 